feat(otel): adding support for OpenTelemetry logs #4350
base: main
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4326458f1e
config := oteltrace.SpanContextConfig{
	TraceID:    traceID,
	SpanID:     spanID,
	TraceFlags: oteltrace.FlagsSampled, // DD spans are sampled by default
	Remote:     false,
Map DD sampling priority to OTel TraceFlags
This always sets TraceFlagsSampled, so a Datadog trace with sampling priority ≤ 0 (not kept) will still produce an OTel span context marked as sampled. In that case, OTLP log records will look sampled and downstream systems may try to correlate them to traces that will never be sent, which can skew sampling-based behavior. Consider deriving the flag from ddCtx.SamplingPriority() (or equivalent) instead of hard‑coding sampled.
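A rough sketch of that mapping, assuming the wrapper can reach a Datadog SamplingPriority() accessor as the comment suggests; the package name and the exact accessor used in this PR may differ:

```go
package logsotel // illustrative package name, not the one in this PR

import oteltrace "go.opentelemetry.io/otel/trace"

// ddSamplingPriority is a stand-in for whatever the PR's DD span context
// exposes; dd-trace-go span contexts provide a SamplingPriority() accessor
// of this shape.
type ddSamplingPriority interface {
	SamplingPriority() (int, bool)
}

// spanContextConfig marks the OTel span context as sampled only when the
// Datadog sampling priority indicates the trace is kept.
func spanContextConfig(ddCtx ddSamplingPriority, traceID oteltrace.TraceID, spanID oteltrace.SpanID) oteltrace.SpanContextConfig {
	var flags oteltrace.TraceFlags // zero value: not sampled
	if prio, ok := ddCtx.SamplingPriority(); ok && prio > 0 {
		flags = oteltrace.FlagsSampled
	}
	return oteltrace.SpanContextConfig{
		TraceID:    traceID,
		SpanID:     spanID,
		TraceFlags: flags,
		Remote:     false,
	}
}
```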
func InitGlobalLoggerProvider(ctx context.Context) error {
	var err error
	globalLoggerProviderOnce.Do(func() {
		globalLoggerProviderMu.Lock()
		defer globalLoggerProviderMu.Unlock()
Allow InitGlobalLoggerProvider to retry after failure
Initialization is guarded by sync.Once, but failures inside the Do block still consume the once. If buildResource or newOTLPExporter fails (e.g., transient misconfiguration), subsequent calls return nil without attempting to initialize again, leaving the provider permanently nil for the process. Consider resetting the once on failure or persisting the error so callers can retry or get a consistent error.
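One possible shape for that, sketched against the helpers this PR appears to define (buildResource, newOTLPExporter, and the package-level provider and mutex), with sdklog standing for go.opentelemetry.io/otel/sdk/log; the SDK options shown are illustrative and may not match the PR exactly:

```go
var (
	globalLoggerProviderMu sync.Mutex
	globalLoggerProvider   *sdklog.LoggerProvider
)

// InitGlobalLoggerProvider initializes the provider at most once, but leaves
// the global nil on failure so a later call can retry instead of being
// permanently disabled by a consumed sync.Once.
func InitGlobalLoggerProvider(ctx context.Context) error {
	globalLoggerProviderMu.Lock()
	defer globalLoggerProviderMu.Unlock()
	if globalLoggerProvider != nil {
		return nil // already initialized
	}
	res, err := buildResource(ctx)
	if err != nil {
		return err // provider stays nil, so the next call retries
	}
	exp, err := newOTLPExporter(ctx)
	if err != nil {
		return err // provider stays nil, so the next call retries
	}
	globalLoggerProvider = sdklog.NewLoggerProvider(
		sdklog.WithResource(res),
		sdklog.WithProcessor(sdklog.NewBatchProcessor(exp)),
	)
	return nil
}
```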
func StartIfEnabled(ctx context.Context) error {
	cfg := config.Get()
	if !cfg.LogsOtelEnabled() {
		log.Debug("DD_LOGS_OTEL_ENABLED=false, skipping OTel LoggerProvider initialization")
		return nil
Wire StartIfEnabled into tracer lifecycle
This helper is defined but has no call sites in the repository, so setting DD_LOGS_OTEL_ENABLED=true never actually initializes the LoggerProvider unless users call StartIfEnabled manually. That makes the feature flag inert in typical tracer startup/shutdown flows. Consider invoking StartIfEnabled/StopIfEnabled from tracer initialization and shutdown to honor the env var.
OpenTelemetry Logs support follows the same manual-initialization model as OTel Metrics in dd-trace-go. This means:
- StartIfEnabled() / StopIfEnabled() are user-facing APIs, not internal hooks
- Users must explicitly call these functions to initialize the logs pipeline
- tracer.Start() and tracer.Stop() deliberately do NOT touch logs
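For readers following along, a minimal sketch of that manual model from application code. The import path for the logs package is hypothetical, and it uses the StartIfEnabled/StopIfEnabled names quoted in this review:

```go
package main

import (
	"context"
	"log"

	"github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"
	// Hypothetical import path for the package added in this PR.
	otellogs "github.com/DataDog/dd-trace-go/v2/otel/logs"
)

func main() {
	ctx := context.Background()

	// tracer.Start/Stop deliberately do not touch the OTel logs pipeline.
	if err := tracer.Start(); err != nil {
		log.Fatal(err)
	}
	defer tracer.Stop()

	// Explicit opt-in: a no-op unless DD_LOGS_OTEL_ENABLED=true.
	if err := otellogs.StartIfEnabled(ctx); err != nil {
		log.Fatal(err)
	}
	defer otellogs.StopIfEnabled()
}
```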
Benchmarks
Benchmark execution time: 2026-01-13 22:59:48
Comparing candidate commit 94efc13 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 156 metrics, 8 unstable metrics.
kakkoyun left a comment
Looks good! I have just some nits.
// DD environment variables for agent configuration
envDDTraceAgentURL  = "DD_TRACE_AGENT_URL"
envDDAgentHost      = "DD_AGENT_HOST"
envDDTraceAgentPort = "DD_TRACE_AGENT_PORT"
According to my local editor, this is not used.
@@ -0,0 +1,76 @@
// Unless explicitly stated otherwise all files in this repository are licensed
We should rename this file so that it ends with _test.go, or simply move it into exporter_test.go.
I assumed this is only used in the tests.
Yes, this is just a helper file used for the tests.
// IsRecording returns true if the span is recording.
func (w *ddSpanWrapper) IsRecording() bool {
	// DD spans are always recording if not finished
Suggested change:
- // DD spans are always recording if not finished
+ // This always returns true because DD spans don't expose a "finished" state
+ // through the public API. In practice, this is acceptable because logs are
+ // typically emitted while spans are active (before Finish is called).
InitialInterval: 1 * time.Second,
MaxInterval:     30 * time.Second,
MaxElapsedTime:  5 * time.Minute,
Let's create constants for this and document the choices.
Suggested change:
- InitialInterval: 1 * time.Second,
- MaxInterval: 30 * time.Second,
- MaxElapsedTime: 5 * time.Minute,
+ InitialInterval: httpRetryInitialInterval,
+ MaxInterval: httpRetryMaxInterval,
+ MaxElapsedTime: httpRetryMaxElapsedTime,
InitialInterval: 5 * time.Second,
MaxInterval:     30 * time.Second,
MaxElapsedTime:  5 * time.Minute,
ditto
Suggested change:
- InitialInterval: 5 * time.Second,
- MaxInterval: 30 * time.Second,
- MaxElapsedTime: 5 * time.Minute,
+ InitialInterval: grpcRetryInitialInterval,
+ MaxInterval: grpcRetryMaxInterval,
+ MaxElapsedTime: grpcRetryMaxElapsedTime,
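A sketch of those constants, using the names from the suggested changes above and the values currently hard-coded in the PR (requires the time package); the rationale comments are placeholders for whatever the author chooses to document:

```go
const (
	// HTTP OTLP exporter retry policy: retry quickly at first, back off to
	// 30s between attempts, and give up after 5 minutes overall.
	httpRetryInitialInterval = 1 * time.Second
	httpRetryMaxInterval     = 30 * time.Second
	httpRetryMaxElapsedTime  = 5 * time.Minute

	// gRPC OTLP exporter retry policy: slower first retry, same ceiling and
	// overall budget as HTTP.
	grpcRetryInitialInterval = 5 * time.Second
	grpcRetryMaxInterval     = 30 * time.Second
	grpcRetryMaxElapsedTime  = 5 * time.Minute
)
```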
	sdklog.Exporter
	telemetry *LogsExportTelemetry
}
Suggested change:
+ // Compile-time check that telemetryExporter implements sdklog.Exporter.
+ var _ sdklog.Exporter = (*telemetryExporter)(nil)
| log.Debug("Shutting down OTel LoggerProvider") | ||
|
|
||
| // Use a timeout context to avoid blocking indefinitely | ||
| ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) |
Same as the above
Suggested change:
- ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
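For completeness, the constant that suggestion refers to could be declared as follows; the name comes from the suggestion and the value matches the 5 seconds hard-coded today:

```go
// shutdownTimeout bounds how long shutdown waits for the LoggerProvider to
// flush and stop before giving up.
const shutdownTimeout = 5 * time.Second
```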
// msConfig holds a milliseconds configuration value with its origin.
type msConfig struct {
	value  int
	origin telemetry.Origin
}
I think configValue is a better name for this.
Suggested change:
- // msConfig holds a milliseconds configuration value with its origin.
- type msConfig struct {
- 	value int
- 	origin telemetry.Origin
- }
+ // configValue holds a configuration value (typically in milliseconds) with its origin.
+ // Used for telemetry reporting to track whether a value came from environment
+ // variables or defaults.
+ type configValue struct {
+ 	value int
+ 	origin telemetry.Origin
+ }
func StopIfEnabled() {
	provider := GetGlobalLoggerProvider()
	if provider == nil {
		// Not initialized, nothing to do
		return
	}

	log.Debug("Shutting down OTel LoggerProvider")

	// Use a timeout context to avoid blocking indefinitely
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := ShutdownGlobalLoggerProvider(ctx); err != nil {
		log.Warn("Error shutting down OTel LoggerProvider: %v", err)
	}
Could this be simplified to the following?
Suggested change:
- func StopIfEnabled() {
- 	provider := GetGlobalLoggerProvider()
- 	if provider == nil {
- 		// Not initialized, nothing to do
- 		return
- 	}
- 	log.Debug("Shutting down OTel LoggerProvider")
- 	// Use a timeout context to avoid blocking indefinitely
- 	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
- 	defer cancel()
- 	if err := ShutdownGlobalLoggerProvider(ctx); err != nil {
- 		log.Warn("Error shutting down OTel LoggerProvider: %v", err)
- 	}
+ // If the provider was not initialized, this is a no-op.
+ func StopIfEnabled() error {
+ 	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
+ 	defer cancel()
+ 	return ShutdownGlobalLoggerProvider(ctx)
//
// If the feature is not enabled, this function is a no-op.
// Returns an error if initialization fails when the feature is enabled.
func StartIfEnabled(ctx context.Context) error {
I think just Start is simpler; we explain everything in the comments. Same for Stop.
Suggested change:
- func StartIfEnabled(ctx context.Context) error {
+ func Start(ctx context.Context) error {
What does this PR do?
WIP
- DD_LOGS_OTEL_ENABLED configuration: when true, enables support for OpenTelemetry logs (default: false)
- trace_id and span_id correlation for both Datadog and OpenTelemetry spans
- OTEL_BLRP_* environment variables for the batch log record processor

Configuration
- DD_LOGS_OTEL_ENABLED - Enable/disable (default: false)
- OTEL_EXPORTER_OTLP_LOGS_ENDPOINT - Custom OTLP endpoint
- OTEL_EXPORTER_OTLP_LOGS_PROTOCOL - Protocol: http/json, http/protobuf, grpc
- OTEL_BLRP_* - Batch processor settings

Implementation
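(Not part of the PR description, but as context for reviewers: once the pipeline above is enabled, emitting a log record from application code through the standard go.opentelemetry.io/otel/log API looks roughly like this, assuming the initialized provider is reachable through the OTel global.)

```go
package example

import (
	"context"

	otellog "go.opentelemetry.io/otel/log"
	"go.opentelemetry.io/otel/log/global"
)

func emitExample(ctx context.Context) {
	// With DD_LOGS_OTEL_ENABLED=true, records emitted through the OTel Logs
	// API are batched (per OTEL_BLRP_*) and exported over OTLP; trace_id and
	// span_id are taken from the active span carried in ctx.
	logger := global.GetLoggerProvider().Logger("my-app")

	var rec otellog.Record
	rec.SetSeverity(otellog.SeverityInfo)
	rec.SetBody(otellog.StringValue("order processed"))
	logger.Emit(ctx, rec)
}
```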
Motivation
The OpenTelemetry Logs API RFC explains that enabling this feature will allow customers using OpenTelemetry standards to migrate to Datadog SDKs with minimal friction while continuing to receive their logs as expected.
Reviewer's Checklist
- Run ./scripts/lint.sh locally.

Unsure? Have a question? Request a review!